
    Prefix-Projection Global Constraint for Sequential Pattern Mining

    Sequential pattern mining under constraints is a challenging data mining task. Many efficient ad hoc methods have been developed for mining sequential patterns, but they all suffer from a lack of genericity. Recent works have investigated Constraint Programming (CP) methods, but these are still not effective because of their encoding. In this paper, we propose a global constraint based on the projected databases principle which remedies this drawback. Experiments show that our approach clearly outperforms CP approaches and competes well with ad hoc methods on large datasets.

    Mining Partially-Ordered Sequential Rules Common to Multiple Sequences

    © 2015 IEEE. Sequential rule mining is an important data mining problem with multiple applications. An important limitation of algorithms for mining sequential rules common to multiple sequences is that rules are very specific and therefore many similar rules may represent the same situation. This can cause three major problems: (1) similar rules can be rated quite differently, (2) rules may not be found because they are individually considered uninteresting, and (3) rules that are too specific are less likely to be used for making predictions. To address these issues, we explore the idea of mining "partially-ordered sequential rules" (POSR), a more general form of sequential rules such that items in the antecedent and the consequent of each rule are unordered. To mine POSR, we propose the RuleGrowth algorithm, which is efficient and easily extendable. In particular, we present an extension (TRuleGrowth) that accepts a sliding-window constraint to find rules occurring within a maximum amount of time. A performance study with four real-life datasets shows that RuleGrowth and TRuleGrowth have excellent performance and scalability compared to baseline algorithms and that the number of rules discovered can be several orders of magnitude smaller when the sliding-window constraint is applied. Furthermore, we report results from a real application showing that POSR can provide much higher prediction accuracy than regular sequential rules for sequence prediction.

    A Knowledge Discovery Framework for Learning Task Models from User Interactions in Intelligent Tutoring Systems

    Domain experts should provide relevant domain knowledge to an Intelligent Tutoring System (ITS) so that it can guide a learner during problem-solving learning activities. However, for many ill-defined domains, the domain knowledge is hard to define explicitly. In previous works, we showed how sequential pattern mining can be used to extract a partial problem space from logged user interactions, and how it can support tutoring services during problem-solving exercises. This article describes an extension of this approach to extract a problem space that is richer and better adapted for supporting tutoring services. We combined sequential pattern mining with (1) dimensional pattern mining, (2) time intervals, (3) the automatic clustering of valued actions, and (4) closed sequence mining. Some tutoring services have been implemented and an experiment has been conducted in a tutoring system.
    Comment: Proceedings of the 7th Mexican International Conference on Artificial Intelligence (MICAI 2008), Springer, pp. 765-77

    Techniques for Complex Analysis of Contemporary Data

    Contemporary data objects are typically complex, semi-structured, or entirely unstructured. Moreover, objects are related to one another, forming a network. In such a situation, data analysis requires not only traditional attribute-based access but also similarity-based access and data mining operations. Though tools for such operations do exist, they usually specialise in particular operations and are available only for specialized data structures supported by specific computer system environments. In contrast, advanced analyses are obtained by applying several elementary access operations, which in turn requires expert knowledge in multiple areas. In this paper, we propose a unification platform for various data analytical operators, specified as a general-purpose analytical system, ADAMiSS. An extensible set of data-mining and similarity-based operators over a common versatile data structure allows the recursive application of heterogeneous operations, thus allowing the definition of the complex analytical processes necessary to solve contemporary analytical tasks. As a proof of concept, we present results obtained by our prototype implementation on two real-world data collections: the Twitter Higgs boson and the Kosarak datasets.

    Mining attribute evolution rules in dynamic attributed graphs

    A dynamic attributed graph is a graph that changes over time and where each vertex is described using multiple continuous attributes. Such graphs are found in numerous domains, e.g., social network analysis. Several studies have addressed discovering patterns in dynamic attributed graphs to reveal how attributes change over time. However, many algorithms restrict all attribute values in a pattern to follow the same trend (e.g., increase) and the set of vertices in a pattern to be fixed, while others consider that a single vertex may influence its neighbors. As a result, these algorithms are unable to find complex patterns that show the influence of multiple vertices on many other vertices in terms of several attributes and different trends. This paper addresses this issue by proposing to discover a novel type of pattern called attribute evolution rules (AER). These rules indicate how changes of attribute values of multiple vertices may influence those of others with high confidence. An efficient algorithm named AER-Miner is proposed to find these rules. Experiments on real data show that AER-Miner is efficient and that AERs can provide interesting insights about dynamic attributed graphs.

    Discovering High-Utility Itemsets at Multiple Abstraction Levels

    High-Utility Itemset Mining (HUIM) is a relevant data mining task. The goal is to discover recurrent combinations of items characterized by high profit from transactional datasets. HUIM has a wide range of applications, among which are market basket analysis and service profiling. Based on the observation that items can be clustered into domain-specific categories, a parallel research issue is generalized itemset mining. It entails generating correlations among data items at multiple abstraction levels. The extraction of multiple-level patterns affords new insights into the analyzed data from different viewpoints. This paper aims at discovering a novel pattern that combines the expressiveness of generalized and High-Utility itemsets. According to a user-defined taxonomy, items are first aggregated into semantically related categories. Then, a new type of pattern, namely the Generalized High-utility Itemset (GHUI), is extracted. It represents a combination of items at different granularity levels characterized by high profit (utility). While profitable combinations of item categories provide interesting high-level information, GHUIs at lower abstraction levels represent more specific correlations among profitable items. A single-phase algorithm is proposed to efficiently discover utility itemsets at multiple abstraction levels. The experiments, which were performed on both real and synthetic data, demonstrate the effectiveness and usefulness of the proposed approach.

    Learning Behavioral Representations of Human Mobility

    In this paper, we investigate the suitability of state-of-the-art representation learning methods for analyzing the behavioral similarity of moving individuals, based on CDR trajectories. The core of the contribution is a novel methodological framework, mob2vec, centered on the combined use of a recent symbolic trajectory segmentation method for the removal of noise, a novel trajectory generalization method incorporating behavioral information, and an unsupervised technique for learning vector representations from sequential data. Mob2vec is the result of an empirical study conducted on real CDR data through extensive experimentation. The study shows that mob2vec generates vector representations of CDR trajectories in low-dimensional spaces which preserve the similarity of the mobility behavior of individuals.
    Comment: ACM SIGSPATIAL 2020: 28th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. November 2020, Seattle, Washington, US